The analysis examines trends in Business Analytics, Data Science, and Machine Learning job postings, with a focus on the skills required for these roles. The study evaluates how varying skill combinations influence salary levels, remote work availability, and career progression pathways.
This analysis employs three main approaches: (1) KMeans clustering to segment jobs based on skill requirements, (2) regression models to predict salary based on skills and experience, and (3) classification models to identify Business Analysis, Data Science, and Machine Learning roles from other positions. The models use 25 technical skills as features along with experience and remote work indicators. Results show that experience is the dominant salary driver, jobs cluster into 6 distinct groups with different compensation and remote work patterns, and BA/ML/DS roles have clearly identifiable skill signatures.
2 Data Loading and Setup
The analysis starts by loading the Lightcast job postings dataset and identifying relevant skill columns. The dataset contains comprehensive information about job postings including titles, salaries, required skills, and other job characteristics.
Code
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.io as pio
import json
import re
from collections import Counter

pio.templates.default = "plotly_white"
pio.renderers.default = "notebook"

# Load data from csv
df = pd.read_csv("data/lightcast_job_postings.csv", low_memory=False)
print(f"Dataset loaded: {len(df):,} rows, {len(df.columns)} columns")
# print(df.head())
Dataset loaded: 72,498 rows, 131 columns
2.1 Important Skills columns
The dataset contains multiple skill-related columns. After examining the schema, the columns ‘SKILLS_NAME’, ‘SOFTWARE_SKILLS_NAME’ and ‘SPECIALIZED_SKILLS_NAME’ provide the most detailed skill information for this analysis. These columns list the specific technical skills mentioned in each job posting.
3 Skills Data Preprocessing
The next step involves filtering the data to include only records with valid salary and title information. Then, binary features are created for 25 key technical skills covering ML, Data Science, and Business Analytics domains to enable machine learning analysis.
Code
# Apply filters
df_filtered = df.dropna(subset=['SALARY', 'TITLE'])

# Convert salary to numeric and filter
df_filtered['SALARY'] = pd.to_numeric(df_filtered['SALARY'], errors='coerce')
df_filtered = df_filtered[df_filtered['SALARY'] > 0]
print(f"Records after filtering: {len(df_filtered):,}")

df_skills = df_filtered.copy()

# Focus on key Business Analytics/ML/Data Science skills. Key skills for
# BA/ML/DS roles identified manually.
key_skills = [
    'Python (Programming Language)', 'R (Programming Language)',
    'SQL (Programming Language)', 'Machine Learning', 'Data Science',
    'Data Analysis', 'Statistics', 'Artificial Intelligence', 'TensorFlow',
    'PyTorch (Machine Learning Library)', 'Pandas (Python Package)',
    'NumPy (Python Package)', 'Scikit-Learn (Python Package)', 'Big Data',
    'Apache Spark', 'Apache Hadoop', 'Amazon Web Services', 'Microsoft Azure',
    'Google Cloud Platform (Gcp)', 'Data Visualization',
    'Tableau (Business Intelligence Software)', 'Power BI',
    'Natural Language Processing (NLP)', 'Computer Vision', 'Deep Learning'
]
print(f"Using focused {len(key_skills)} BA/ML/DS technical skills for analysis")

# Create binary features for each key skill.
for skill in key_skills:
    # Clean skill name for column naming
    # Eg: R (Programming Language) --> has_r_programming_language
    skill_col_name = f'has_{skill.lower().replace(" ", "_").replace("-", "_").replace("(", "").replace(")", "")}'
    df_skills[skill_col_name] = (
        df_skills['SKILLS_NAME'].str.contains(skill, case=False, na=False, regex=False)
        | df_skills['SOFTWARE_SKILLS_NAME'].str.contains(skill, case=False, na=False, regex=False)
        | df_skills['SPECIALIZED_SKILLS_NAME'].str.contains(skill, case=False, na=False, regex=False)
    ).astype(int)
print("Binary skill features created")

# Create ML/DS role indicators using focused skills
core_ml_skills = ['has_machine_learning', 'has_artificial_intelligence', 'has_tensorflow',
                  'has_pytorch_machine_learning_library', 'has_deep_learning',
                  'has_natural_language_processing_nlp', 'has_computer_vision']
core_ds_skills = ['has_python_programming_language', 'has_r_programming_language', 'has_statistics',
                  'has_data_science', 'has_pandas_python_package', 'has_numpy_python_package',
                  'has_scikit_learn_python_package', 'has_big_data']
core_ba_skills = ['has_data_analysis', 'has_data_visualization', 'has_sql_programming_language',
                  'has_tableau_business_intelligence_software', 'has_power_bi']

# Role indicators
# ML roles are straightforward.
df_skills['is_ml_role'] = (df_skills[core_ml_skills].sum(axis=1) > 0).astype(int)

# R is primarily associated with the Data Science field. So, if a job
# requires R, or lists more than one data science skill, it is considered
# a DS role. Note the parentheses around the equality test: `|` binds
# tighter than `==` in Python, so they are required for correctness.
df_skills['is_ds_role'] = (
    (df_skills['has_r_programming_language'] == 1)
    | (df_skills[core_ds_skills].sum(axis=1) > 1)
).astype(int)

# Business Analytics roles typically require SQL, visualization tools (Tableau, Power BI)
# and data analysis capabilities. If a job has two or more BA skills, consider it a BA role.
df_skills['is_ba_role'] = (df_skills[core_ba_skills].sum(axis=1) >= 2).astype(int)

# Remote work indicator
df_skills['is_remote'] = df_skills['REMOTE_TYPE'].fillna(0).astype(int)
df_skills['experience_years'] = df_skills['MIN_YEARS_EXPERIENCE'].fillna(0)

df_final = df_skills
print(f"Final dataset size: {len(df_final):,}")
print(f"ML roles identified: {df_final['is_ml_role'].sum():,}")
print(f"Data Science roles identified: {df_final['is_ds_role'].sum():,}")
print(f"Business Analytics roles identified: {df_final['is_ba_role'].sum():,}")
print(f"BA/ML/DS combined: {((df_final['is_ml_role'] == 1) | (df_final['is_ds_role'] == 1) | (df_final['is_ba_role'] == 1)).sum():,}")
Records after filtering: 30,808
Using focused 25 BA/ML/DS technical skills for analysis
Binary skill features created
Final dataset size: 30,808
ML roles identified: 3,226
Data Science roles identified: 2,877
Business Analytics roles identified: 10,831
BA/ML/DS combined: 12,821
For each of the 25 key skills, a binary indicator variable is created (1 if the skill is mentioned, 0 otherwise). This transforms the text skill data into numerical features suitable for machine learning models.
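As a minimal illustration of this encoding (using a hypothetical two-row dataset that mimics the comma-separated structure of SKILLS_NAME), each skill becomes a 0/1 column via a substring match:

```python
import pandas as pd

# Hypothetical postings with comma-separated skill lists, mimicking the
# structure of the SKILLS_NAME column in the Lightcast data.
toy = pd.DataFrame({
    "SKILLS_NAME": [
        "Python (Programming Language), SQL (Programming Language)",
        "Tableau (Business Intelligence Software)",
    ]
})

for skill in ["Python (Programming Language)", "SQL (Programming Language)"]:
    # Same column-naming convention as the analysis:
    # "Python (Programming Language)" -> has_python_programming_language
    col = "has_" + skill.lower().replace(" ", "_").replace("(", "").replace(")", "")
    # regex=False treats the skill name as a literal substring
    toy[col] = toy["SKILLS_NAME"].str.contains(skill, case=False, regex=False).astype(int)

print(toy.filter(like="has_"))
# The first posting matches both skills; the second matches neither.
```

A caveat of substring matching: short skill names can match inside longer ones (e.g. "Machine Learning" also matches "Machine Learning Engineer"), which is acceptable here since a mention still indicates the skill.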
3.1 Role Classification Logic
Three role categories are identified based on technical skills:
ML roles: Require advanced ML/AI skills like TensorFlow, PyTorch, Deep Learning, NLP, Computer Vision
Data Science roles: Require R programming, Python with Statistics, or multiple data science tools (Pandas, NumPy, Scikit-learn)
Business Analytics roles: Require SQL, data analysis, visualization tools (Tableau, Power BI), typically 2+ BA skills
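The three labeling rules can be sketched on a toy skill matrix (column names mirror the analysis; the three rows are hypothetical postings, one per role type):

```python
import pandas as pd

# One hypothetical posting per row; 1 = skill mentioned in the posting.
toy = pd.DataFrame({
    "has_machine_learning": [1, 0, 0],
    "has_r_programming_language": [0, 1, 0],
    "has_python_programming_language": [0, 1, 0],
    "has_statistics": [0, 1, 0],
    "has_sql_programming_language": [0, 0, 1],
    "has_tableau_business_intelligence_software": [0, 0, 1],
})

# Abbreviated skill groups (the full analysis uses larger lists).
ml_cols = ["has_machine_learning"]
ds_cols = ["has_python_programming_language", "has_statistics"]
ba_cols = ["has_sql_programming_language", "has_tableau_business_intelligence_software"]

# ML: any core ML skill present.
toy["is_ml_role"] = (toy[ml_cols].sum(axis=1) > 0).astype(int)
# DS: R mentioned, OR more than one core DS skill.
toy["is_ds_role"] = ((toy["has_r_programming_language"] == 1)
                     | (toy[ds_cols].sum(axis=1) > 1)).astype(int)
# BA: two or more core BA skills.
toy["is_ba_role"] = (toy[ba_cols].sum(axis=1) >= 2).astype(int)

print(toy[["is_ml_role", "is_ds_role", "is_ba_role"]])
```

Each toy row triggers exactly one rule, but in the real data the categories overlap, which is why the combined BA/ML/DS count (12,821) is less than the sum of the three role counts.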
The analysis examines how these specialized skills impact salary and career opportunities. Machine learning models are used to find patterns that can guide job seekers in choosing which skills to develop.
4 Feature Engineering for ML
Before building models, the dataset is prepared by selecting relevant columns. This includes the salary (target variable), skill indicators, remote work status, and experience years.
Code
# Prepare the modeling dataset
modeling_cols = ['SALARY', 'is_ml_role', 'is_ds_role', 'is_ba_role',
                 'is_remote', 'experience_years'] + \
    [col for col in df_final.columns if col.startswith('has_')]
df_modeling = df_final[modeling_cols].copy()
print("Features for modeling:")
print(f"Dataset shape: {df_modeling.shape}")
print(f"Columns: {list(df_modeling.columns)}")
print(f"Missing values: {df_modeling.isnull().sum().sum()}")
The modeling dataset now contains binary skill features, experience, remote work indicator, and salary information. This structured format allows application of various machine learning techniques.
5 Unsupervised Learning: KMeans Clustering Based on Skills
The first machine learning approach uses KMeans clustering to discover natural groupings in the job market. This unsupervised technique groups jobs with similar skill profiles together, without using salary information. The goal is to see if jobs naturally segment into distinct categories based on their requirements.
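The analysis fixes k = 6 clusters. A common sanity check for that choice (not part of the original analysis, shown here on synthetic data standing in for the scaled skill matrix) is to sweep k and compare inertia and silhouette scores:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)
# Synthetic stand-in for the scaled feature matrix: three well-separated
# blobs in 5 dimensions, so the "true" number of groups is 3.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 5)) for c in (0.0, 2.0, 4.0)])

for k in range(2, 7):
    km = KMeans(n_clusters=k, random_state=42, n_init=10).fit(X)
    sil = silhouette_score(X, km.labels_)
    print(f"k={k}: inertia={km.inertia_:,.0f}, silhouette={sil:.2f}")
# Inertia always decreases with k; the silhouette score typically peaks
# at the true number of groups (3 for this synthetic data).
```

On the real, higher-dimensional binary skill data the silhouette peak is usually less pronounced, so k = 6 should be read as a reasonable segmentation choice rather than a uniquely correct one.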
Code
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.metrics import (mean_squared_error, r2_score, accuracy_score,
                             f1_score, confusion_matrix, classification_report)

# Prepare features for clustering using skills and other features
skill_feature_cols = [col for col in df_modeling.columns if col.startswith('has_')]
print(f"Available skill features: {len(skill_feature_cols)}")

# Base clustering features
clustering_features = skill_feature_cols + ['experience_years', 'is_remote']

# Encode ONET and NAICS6.
le_onet = LabelEncoder()
df_modeling['onet_encoded'] = le_onet.fit_transform(df_final['ONET'].fillna('Unknown'))
clustering_features.append('onet_encoded')

le_naics = LabelEncoder()
df_modeling['naics_encoded'] = le_naics.fit_transform(df_final['NAICS6'].fillna('Unknown'))
clustering_features.append('naics_encoded')

# Prepare clustering data
X_cluster = df_modeling[clustering_features].fillna(0)

# Scale features
scaler_cluster = StandardScaler()
X_cluster_scaled = scaler_cluster.fit_transform(X_cluster)

# KMeans clustering
kmeans = KMeans(n_clusters=6, random_state=42, n_init=10)
clusters = kmeans.fit_predict(X_cluster_scaled)
df_modeling['cluster'] = clusters

# print("Skills based clustering completed")
# print("Cluster centers:")
# for i, center in enumerate(kmeans.cluster_centers_):
#     print(f"Cluster {i}: {center}")
Available skill features: 25
The clustering model groups similar jobs together using skill patterns, experience requirements, and job characteristics. The algorithm assigns each job to one of 6 clusters. Now the characteristics of each cluster can be examined to understand what makes them distinct.
Code
# Analyze clustering.
cluster_summary = df_modeling.groupby('cluster').agg({
    'SALARY': ['count', 'mean'],
    'is_ml_role': 'mean',
    'is_ds_role': 'mean',
    'is_ba_role': 'mean',
    'is_remote': 'mean',
    'experience_years': 'mean'
}).round(2)
cluster_summary.columns = ['count', 'avg_salary', 'ml_role_pct', 'ds_role_pct',
                           'ba_role_pct', 'remote_percentage', 'avg_experience']
cluster_summary = cluster_summary.reset_index()

# Compute combined BA/ML/DS percentage on the fly.
# A job counts as BA/ML/DS if it has any of the three role types.
cluster_summary['ml_ds_ba_combined_pct'] = cluster_summary.apply(
    lambda row: (df_modeling[df_modeling['cluster'] == row['cluster']]
                 [['is_ml_role', 'is_ds_role', 'is_ba_role']].sum(axis=1) > 0).mean(),
    axis=1
).round(2)
print("Skills based Cluster Summary:")
print(cluster_summary)

# Visualize cluster characteristics.
fig = make_subplots(
    rows=2, cols=3,
    subplot_titles=('Cluster Size', 'Average Salary', 'BA/ML/DS Role %',
                    'Remote Work %', 'Avg Experience', 'Salary Distribution'),
    specs=[[{"type": "bar"}, {"type": "bar"}, {"type": "bar"}],
           [{"type": "bar"}, {"type": "bar"}, {"type": "scatter"}]]
)
fig.add_trace(go.Bar(x=cluster_summary['cluster'], y=cluster_summary['count'], name="Count"), row=1, col=1)
fig.add_trace(go.Bar(x=cluster_summary['cluster'], y=cluster_summary['avg_salary'], name="Avg Salary"), row=1, col=2)
fig.add_trace(go.Bar(x=cluster_summary['cluster'], y=cluster_summary['ml_role_pct'], name="ML %"), row=1, col=3)
fig.add_trace(go.Bar(x=cluster_summary['cluster'], y=cluster_summary['ds_role_pct'], name="DS %"), row=1, col=3)
fig.add_trace(go.Bar(x=cluster_summary['cluster'], y=cluster_summary['ba_role_pct'], name="BA %"), row=1, col=3)
fig.add_trace(go.Bar(x=cluster_summary['cluster'], y=cluster_summary['remote_percentage'], name="Remote %"), row=2, col=1)
fig.add_trace(go.Bar(x=cluster_summary['cluster'], y=cluster_summary['avg_experience'], name="Experience"), row=2, col=2)

# Salary distribution by cluster.
fig.add_trace(
    go.Scatter(x=df_modeling['cluster'], y=df_modeling['SALARY'],
               mode='markers', opacity=0.6, name="Jobs"),
    row=2, col=3
)
fig.update_layout(
    height=650, showlegend=False, template="plotly_white",
    title={'text': "Skills-Based KMeans Clustering Results",
           'y': 0.98, 'x': 0.5, 'xanchor': 'center', 'yanchor': 'top'},
    margin=dict(t=80)
)
fig.show()
The clustering analysis grouped jobs based on their skill requirements and characteristics. The analysis identified 6 distinct job clusters, each with different salary levels, remote work availability, and skill profiles.
Key Findings:
Business Analytics dominates: 10,831 BA roles vs. 3,226 ML and 2,877 DS
BA-focused growth: Cluster 3 ($109K) — strong BA demand with DS hybrid edge
Specialist track: Cluster 4 ($140K) — pure ML, fewer jobs but high pay
Hybrid advantage: Cluster 0 ($140K) and Cluster 5 ($118K, 56% remote) — multi-skill roles with flexibility
6 Supervised Learning: Multiple Regression
The second approach uses supervised learning to predict salary based on skills and experience. Two regression models are trained: Linear Regression and Random Forest. This analysis identifies which skills and factors most strongly influence compensation.
Code
# Identify regression features.
# Focus on skills (not role labels) to understand how skills directly affect salary
regression_features = skill_feature_cols + ['experience_years', 'is_remote']

# Prepare regression data using salary as the target variable
X_reg = df_modeling[regression_features].fillna(0)
y_reg = df_modeling['SALARY']
X_train, X_test, y_train, y_test = train_test_split(X_reg, y_reg, test_size=0.2, random_state=42)
print(f"Training set size: {len(X_train):,}")
print(f"Test set size: {len(X_test):,}")

# Scale features
scaler_reg = StandardScaler()
X_train_scaled = scaler_reg.fit_transform(X_train)
X_test_scaled = scaler_reg.transform(X_test)

# Multiple Linear Regression
lr = LinearRegression()
lr.fit(X_train_scaled, y_train)

# Random Forest Regression
rf_reg = RandomForestRegressor(n_estimators=100, random_state=42)
rf_reg.fit(X_train_scaled, y_train)
print("Skills based regression models training completed")
Training set size: 24,646
Test set size: 6,162
Skills based regression models training completed
Both models are trained on 80% of the data and will be evaluated on the remaining 20% test set. The Random Forest model can capture non-linear relationships and interactions between skills, while Linear Regression provides a baseline for comparison.
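Why Random Forest tends to beat Linear Regression here can be shown on a small synthetic dataset (entirely illustrative, not the Lightcast data): when the target contains an interaction between two binary "skills", the linear model cannot represent it, while the forest can. Cross-validation also gives a more stable estimate than a single 80/20 split:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Synthetic salary-like data: five binary "skills", a continuous
# "experience" column, and a skill-pair interaction that a purely
# linear model cannot capture.
X = rng.integers(0, 2, size=(1000, 5)).astype(float)
exp = rng.uniform(0, 10, size=1000)
y = 50_000 + 6_000 * exp + 40_000 * X[:, 0] * X[:, 1] + rng.normal(0, 3_000, 1000)
X_full = np.column_stack([X, exp])

for name, model in [("LinearRegression", LinearRegression()),
                    ("RandomForest", RandomForestRegressor(n_estimators=100, random_state=42))]:
    r2 = cross_val_score(model, X_full, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R² = {r2:.3f}")
```

The forest's advantage in this sketch comes purely from the interaction term; on the real data, the R² gap (0.47 vs. 0.28) similarly suggests that skill combinations matter beyond their individual effects.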
Code
# Evaluate regression models

# Linear Regression predictions
y_pred_lr = lr.predict(X_test_scaled)
rmse_lr = np.sqrt(mean_squared_error(y_test, y_pred_lr))
r2_lr = r2_score(y_test, y_pred_lr)

# Random Forest predictions
y_pred_rf = rf_reg.predict(X_test_scaled)
rmse_rf = np.sqrt(mean_squared_error(y_test, y_pred_rf))
r2_rf = r2_score(y_test, y_pred_rf)

print("Skills-based Regression Model Performance:")
print(f"Linear Regression - RMSE: ${rmse_lr:,.2f}, R²: {r2_lr:.4f}")
print(f"Random Forest - RMSE: ${rmse_rf:,.2f}, R²: {r2_rf:.4f}")

# Feature importance for Random Forest.
# Only use features that actually exist in the model.
actual_feature_names = [col for col in regression_features if col in X_train.columns]
importances = rf_reg.feature_importances_

# Visualize feature importance
fig = px.bar(x=actual_feature_names, y=importances,
             title="Skills Impact on Salary (Random Forest Feature Importance)",
             labels={'x': 'Features', 'y': 'Importance'})
fig.update_layout(template="plotly_white", xaxis_tickangle=-45)
fig.show()

# Top skills by salary impact
skill_importance = list(zip(actual_feature_names, importances))
skill_importance.sort(key=lambda x: x[1], reverse=True)
print("\nTop skills by salary impact:")
for skill, importance in skill_importance[:10]:
    print(f"{skill}: {importance:.4f}")
Skills-based Regression Model Performance:
Linear Regression - RMSE: $37,899.01, R²: 0.2780
Random Forest - RMSE: $32,558.54, R²: 0.4672
Prediction models were built to understand how skills influence salary. The Random Forest model achieved an R² of 0.47, compared with 0.28 for Linear Regression, showing that skill-salary relationships are complex and partly non-linear.
Model Performance:
Random Forest: R² = 0.47 (explains 47% of salary variation), RMSE = $32,559
Linear Regression: R² = 0.28
Insight: Skills alone do not fully explain salary — other factors also matter.
Key Salary Drivers (Feature Importance):
Experience (0.49): Largest factor, nearly half of salary variation
Remote work (0.07): Flexibility influences pay differences
Data Analysis (0.04): Core analytical capability
Tableau (0.04): Visualization and BI tool
AWS (0.04): Cloud computing platform
SQL (0.04): Database querying and manipulation
Statistics (0.03): Analytical foundation
Python (0.03): Programming language
Career Implications:
Experience is critical — the strongest driver of salary.
Remote work adds value — flexibility can boost compensation.
Skill combinations matter — technical, analytical, and cloud skills together shape salary outcomes.
Summary: Salary is not determined by skills alone. Experience and work flexibility are key, while technical skills provide additional differentiation.
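One caveat about the importance numbers above: impurity-based Random Forest importances tend to favor continuous, high-cardinality features such as experience over binary skill flags. A common cross-check (not run in the original analysis) is permutation importance on held-out data, sketched here on synthetic stand-in features:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Synthetic stand-in: one continuous driver ("experience") and two binary
# "skills", only the first of which actually affects the target.
exp = rng.uniform(0, 10, 800)
skill_a = rng.integers(0, 2, 800)
skill_b = rng.integers(0, 2, 800)  # pure noise
y = 9_000 * exp + 12_000 * skill_a + rng.normal(0, 4_000, 800)
X = np.column_stack([exp, skill_a, skill_b])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)
rf = RandomForestRegressor(n_estimators=100, random_state=42).fit(X_tr, y_tr)

# Permutation importance: drop in test R² when each column is shuffled.
result = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=42)
for name, imp in zip(["experience", "skill_a", "skill_b"], result.importances_mean):
    print(f"{name}: {imp:.3f}")
# The noise feature should score near zero; the real drivers should not.
```

If permutation importance on the real data gave the same ranking (experience first, then individual skills), that would strengthen the conclusion that experience dominates rather than being an artifact of the importance metric.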
7 Supervised Learning: Classification to Identify BA/ML/DS Roles
Although the project required only one supervised learning model, this analysis also explores classification to distinguish ML/Data Science roles from Business Analytics and other positions. A Random Forest classifier is trained to predict whether a job is an ML/DS role based on its skill requirements, revealing which skills are the strongest "signature" indicators that set ML/DS positions apart from BA roles.
Code
# Prepare features for classification.
classification_features = skill_feature_cols + ['experience_years', 'is_remote']

# Prepare classification data
X_clf = df_modeling[classification_features].fillna(0)

# Target: ML/DS roles (computed from is_ml_role OR is_ds_role)
y_clf = ((df_modeling['is_ml_role'] == 1) | (df_modeling['is_ds_role'] == 1)).astype(int)

# Train/test split for classification
X_train_clf, X_test_clf, y_train_clf, y_test_clf = train_test_split(
    X_clf, y_clf, test_size=0.2, random_state=42)

# Scale features
scaler_clf = StandardScaler()
X_train_clf_scaled = scaler_clf.fit_transform(X_train_clf)
X_test_clf_scaled = scaler_clf.transform(X_test_clf)

# Random Forest Classification
rf_clf = RandomForestClassifier(n_estimators=100, random_state=42)
rf_clf.fit(X_train_clf_scaled, y_train_clf)
print("Skills-based classification model trained successfully!")
Skills-based classification model trained successfully!
The classifier learns patterns that distinguish ML/DS roles from BA and other positions based on their skill profiles. The model is now evaluated to see how accurately it can identify these specialized ML/DS roles versus the more common BA positions.
Code
# Random Forest predictions
y_pred_rf_clf = rf_clf.predict(X_test_clf_scaled)
accuracy_rf = accuracy_score(y_test_clf, y_pred_rf_clf)
f1_rf = f1_score(y_test_clf, y_pred_rf_clf)
print("Skills based Classification Model Performance:")
print(f"Random Forest - Accuracy: {accuracy_rf:.4f}, F1 Score: {f1_rf:.4f}")

# Confusion Matrix for Random Forest
cm = confusion_matrix(y_test_clf, y_pred_rf_clf)

# Visualize confusion matrix
fig = px.imshow(cm, text_auto=True, aspect="auto",
                title="Confusion Matrix - ML/DS Role Classification",
                labels=dict(x="Predicted", y="Actual"),
                color_continuous_scale="Blues")
fig.update_layout(template="plotly_white")
fig.update_xaxes(tickvals=[0, 1], ticktext=['Not ML/DS', 'ML/DS'])
fig.update_yaxes(tickvals=[0, 1], ticktext=['Not ML/DS', 'ML/DS'])
fig.show()

print("Classification Report:")
print(classification_report(y_test_clf, y_pred_rf_clf))

# Only use features that actually exist in the classification model
clf_actual_feature_names = [col for col in classification_features if col in X_train_clf.columns]
clf_importances = rf_clf.feature_importances_

# Visualize classification feature importance
fig = px.bar(x=clf_actual_feature_names, y=clf_importances,
             title="Skills Impact on ML/Data Science Role Classification",
             labels={'x': 'Features', 'y': 'Importance'})
fig.update_layout(template="plotly_white", xaxis_tickangle=-45)
fig.show()
Skills based Classification Model Performance:
Random Forest - Accuracy: 0.9995, F1 Score: 0.9986
A Random Forest classifier was used to predict whether a job is an ML/Data Science role based on its skill requirements. The model achieved very strong performance in separating ML/DS roles from Business Analytics and other positions.
Model Performance:
Accuracy: 99.95% — nearly all ML/DS roles correctly identified
Insight: ML/DS roles have distinct skill patterns compared to BA and general analyst jobs
Conclusion: Skill-based criteria effectively distinguish ML/DS roles from BA positions
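One observation about this setup (not from the original analysis): the ML/DS label is itself a deterministic function of the same skill indicators used as features, so near-perfect accuracy is largely expected by construction. This can be demonstrated with a shallow decision tree on synthetic binary features whose label is defined by a rule of the same shape as the is_ml_role/is_ds_role logic:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(7)
# Random binary "skill" matrix; the label is a rule over the same columns,
# mirroring how the ML/DS target is derived from the has_* features.
X = rng.integers(0, 2, size=(2000, 6))
y = ((X[:, 0] == 1) | (X[:, 1:4].sum(axis=1) > 1)).astype(int)

# A depth-4 tree is enough to represent this rule exactly.
tree = DecisionTreeClassifier(max_depth=4, random_state=42).fit(X, y)
print(f"Training accuracy: {tree.score(X, y):.3f}")
```

A rule-defined label is learned almost perfectly, so the 99.95% figure mainly confirms the internal consistency of the role definitions; the more informative output is the feature-importance ranking, which shows which skills carry the rule.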
Key Predictive Skills (Feature Importance)
Programming: Python, R
ML Frameworks: TensorFlow, PyTorch
Statistical Modeling: Core differentiator for ML/DS
BA-Oriented Skills: SQL, Tableau, Power BI, Data Analysis (more common in BA roles)
Career Implications
Distinct skill sets: ML/DS roles require clearly different capabilities than BA roles
ML/DS focus: Programming, modeling, and ML frameworks are the strongest signals
BA focus: SQL, visualization, and reporting tools dominate BA roles
Career development: Building expertise in high-importance ML/DS features directly improves readiness for ML/DS positions
Summary: The Random Forest classifier confirms that ML/DS roles are defined by specialized technical skills, while BA roles emphasize analysis and visualization tools. This distinction provides a clear roadmap for professionals aiming to transition into ML/DS careers.
8 Model Results Visualization
Code
# Summarize core model performance
model_summary = pd.DataFrame({
    'Model': ['Linear Regression', 'Random Forest (Regression)', 'Random Forest (Classification)'],
    'R² / Accuracy': [r2_lr, r2_rf, accuracy_rf],
    'RMSE / F1 Score': [rmse_lr, rmse_rf, f1_rf]
})
print(model_summary)

# Visualization of model results
fig = make_subplots(
    rows=1, cols=2,
    subplot_titles=('Model Performance Comparison', 'Skills vs Salary Impact'),
    specs=[[{"type": "bar"}, {"type": "bar"}]]
)

# Model performance comparison
models = ['Linear Regression', 'Random Forest Regression', 'Random Forest Classification']
metrics = [r2_lr, r2_rf, accuracy_rf]
fig.add_trace(go.Bar(x=models, y=metrics, name="Performance"), row=1, col=1)

# Skills vs salary impact
top_skills_salary = skill_importance[:8]
fig.add_trace(go.Bar(x=[s[0] for s in top_skills_salary],
                     y=[s[1] for s in top_skills_salary],
                     name="Salary Impact"), row=1, col=2)

fig.update_layout(
    height=450, showlegend=False, template="plotly_white",
    title={'text': "Core Model Results - ML/Data Science Skills Analysis",
           'y': 0.98, 'x': 0.5, 'xanchor': 'center', 'yanchor': 'top'},
    margin=dict(t=80)
)
fig.show()
Model R² / Accuracy RMSE / F1 Score
0 Linear Regression 0.278032 37899.005358
1 Random Forest (Regression) 0.467166 32558.537199
2 Random Forest (Classification) 0.999513 0.998609
9 Key Takeaways and Recommendations
9.1 Summary of Findings
Our analysis of business analytics, data science and machine learning job postings reveals several important patterns:
Skill-Based Job Segmentation: Jobs cluster into 6 distinct groups. Cluster 4 (pure ML/DS) pays $140K with only 77 positions, while Cluster 1 (10,189 jobs) pays $145K with mixed roles. Remote work availability varies from 25% to 56% across clusters.
Salary Drivers: Experience dominates (49% importance) followed by remote work capability (7%). Technical skills like Tableau, AWS, SQL and Python each contribute 3-4%. The R² of 0.47 shows skills explain about half of salary variation.
Role Differentiation: ML/DS roles have distinct skill patterns, achieving near-perfect (99.95%) classification accuracy. Because the role labels are themselves derived from skill indicators, some of this separability is by construction, but it still indicates that these specialized positions require clearly different capabilities than analyst roles.
9.2 Recommendations for Job Seekers
For Career Advancement: - Gain experience - it is the single biggest salary driver (49% of feature importance) - Develop remote work capabilities - remote availability accounts for about 7% of importance - Learn practical tools: Tableau, AWS, and SQL each carry 3-4% of feature importance - Don't limit the search to ML/DS titles: Cluster 1 (non-ML) pays $145K vs. Cluster 4 (pure ML) at $140K
For Transitioning to ML/Data Science: - The near-perfect (99.95%) classification accuracy shows these roles need very specific skill combinations - Focus on the specialized skills shown in the classification importance chart - Note: ML/DS specialization has fewer opportunities (Cluster 4 has only 77 jobs)
For Maximizing Opportunities: - For remote work: Target Cluster 5 skills (56% remote, 63% ML/DS roles) - For job volume: Cluster 1 has most opportunities (10,189 jobs) at highest pay ($145K) - For specialization: Cluster 4 is pure ML/DS but limited opportunities (77 jobs)
9.3 Limitations and Considerations
The analysis is based on job posting data which may not reflect actual hiring outcomes
Skill requirements in job posts may differ from day-to-day job responsibilities
Market conditions and geographic factors also influence salaries beyond just skills
The models identify patterns but don’t capture all nuances of career success